Abstract: In this paper we study the covering numbers of the space of
convex and uniformly bounded functions in multi-dimension. We find
optimal upper and lower bounds for the $\epsilon$-covering number of
$\mathcal{C}([a,b]^d, B)$, in the $L_p$-metric, $1 \le p < \infty$, in
terms of the relevant constants, where $d \ge 1$, $a < b \in \mathbb{R}$,
$B > 0$, and $\mathcal{C}([a,b]^d, B)$ denotes the set of all convex
functions on $[a,b]^d$ that are uniformly bounded by $B$. We summarize
previously known results on covering numbers for convex functions and also
provide alternate proofs of some known results. Our results have direct
implications in the study of rates of convergence of empirical
minimization procedures as well as optimal convergence rates in the
numerous convexity constrained function estimation problems.

Index Terms: convexity constrained function estimation, empirical risk
minimization, Hausdorff distance, Kolmogorov entropy, $L_p$-metric,
metric entropy, packing numbers.



I. Introduction 

EVER since the work of [1], covering numbers (and their logarithms,
known as metric entropy numbers) have been studied extensively in a
variety of disciplines. For a subset $\mathcal{F}$ of a metric space
$(\mathcal{X}, \rho)$, the $\epsilon$-covering number
$M(\mathcal{F}, \epsilon; \rho)$ is defined as the smallest number of
balls of radius $\epsilon$ whose union contains $\mathcal{F}$. Covering
numbers capture the size of the underlying metric space and play a
central role in a number of areas in information theory and statistics,
including nonparametric function estimation, density estimation,
empirical processes and machine learning.

In this paper we study the covering numbers of the space of convex and
uniformly bounded functions in multi-dimension. Specifically, we find
optimal upper and lower bounds for the $\epsilon$-covering number
$M(\mathcal{C}([a,b]^d, B), \epsilon; L_p)$, in the $L_p$-metric,
$1 \le p < \infty$, in terms of the relevant constants, where $d \ge 1$,
$a, b \in \mathbb{R}$, $B > 0$, and $\mathcal{C}([a,b]^d, B)$ denotes the
set of all convex functions on $[a,b]^d$ that are uniformly bounded by
$B$. We also summarize previously known results on covering numbers for
convex functions. The special case of the problem when $d = 1$ has been
recently established by Dryanov in [2, Theorem 3.1]. Prior to [2], the
only other result on the covering numbers of convex functions is due to
Bronshtein in [3] (see also [4, Chapter 8]), who considered convex
functions that are uniformly bounded and uniformly Lipschitz with a known
Lipschitz constant under the $L_\infty$ metric.

In recent years there has been an upsurge of interest in nonparametric
function estimation under convexity based constraints, especially in
multi-dimension. In general function estimation, it is well-known (see
e.g., [5]-[8]) that the covering numbers of the underlying function space
can be used to characterize optimal rates of convergence. They are also
useful for studying the rates of convergence of empirical minimization
procedures (see e.g., [9], [10]). Our results have direct implications in
this regard in the context of understanding the rates of convergence of
the numerous convexity constrained function estimators, e.g., the
nonparametric least squares estimator of a convex regression function
studied in [11], [12]; the maximum likelihood estimator of a log-concave
density in multi-dimension studied in [13]-[15]. Similar problems that
crucially use convexity/concavity constraints to estimate sets have also
received recent attention in the statistical and machine learning
literature, see e.g., [16], [17], and our results can be applied in such
settings.

The paper is organized as follows. In Section II we set up notation and
provide motivation for our main results, which are proved in Section III.
In Section IV we draw some connections to previous results on covering
numbers for convex functions and prove a related auxiliary result along
with some inequalities of possible independent interest.

II. Motivation 

The first result on covering numbers for convex functions was proved by
Bronshtein in [3], who considered convex functions defined on a cube in
$\mathbb{R}^d$ that are uniformly bounded and uniformly Lipschitz.
Specifically, let $\mathcal{C}([a,b]^d, B, \Gamma)$ denote the class of
real-valued convex functions defined on $[a,b]^d$ that are uniformly
bounded in absolute value by $B$ and uniformly Lipschitz with constant
$\Gamma$. In Theorem 6 of [3], Bronshtein proved that for $\epsilon$
sufficiently small, the logarithm of
$M(\mathcal{C}([a,b]^d, B, \Gamma), \epsilon; L_\infty)$ can be bounded
from above and below by a positive constant (not depending on $\epsilon$)
multiple of $\epsilon^{-d/2}$. Note that the $L_\infty$ distance between
two functions $f$ and $g$ on $[a,b]^d$ is defined as
$\|f - g\|_\infty := \sup_{x \in [a,b]^d} |f(x) - g(x)|$.

Bronshtein worked with the class $\mathcal{C}([a,b]^d, B, \Gamma)$ where
the functions are uniformly Lipschitz with constant $\Gamma$. However, in
convexity-based function estimation problems, one usually does not have a
known uniform Lipschitz bound on the unknown function class. This leads
to difficulties in the analysis of empirical minimization procedures via
Bronshtein's result. To the best of our knowledge, there does not exist
any other result on the covering numbers of convex functions that deals
with all $d \ge 1$ and does not require the Lipschitz constraint.

In the absence of the uniformly Lipschitz constraint (i.e., if one works
with the class $\mathcal{C}([a,b]^d, B)$ instead of
$\mathcal{C}([a,b]^d, B, \Gamma)$), the covering numbers under the
$L_\infty$ metric are infinite. In other words, the space
$\mathcal{C}([a,b]^d, B)$ is not totally bounded under the $L_\infty$
metric. This can be seen, for example, by noting that the functions

$$f_j(t) := \max\left(0, 1 - 2^j t\right), \quad \text{for } t \in [0,1],$$

are in $\mathcal{C}([0,1], 1)$, for all $j \ge 1$, and satisfy

$$\|f_j - f_k\|_\infty \ge |f_j(2^{-k}) - f_k(2^{-k})| = 1 - 2^{j-k} \ge 1/2, \quad \text{for all } j < k.$$
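
The failure of total boundedness under the sup norm can be checked numerically. The following Python sketch is only an illustration (the names `f`, `sup_distance` and `l1_distance` are ours, not from the paper). It evaluates the hat functions $f_j$ on a dyadic grid that contains the points $2^{-k}$, and it uses the closed-form area $2^{-j-1}$ under $f_j$ to show that the same functions, although separated by at least $1/2$ in sup norm, become arbitrarily close in $L_1$.

```python
import numpy as np

def f(j, t):
    """Hat function f_j(t) = max(0, 1 - 2^j t) on [0, 1]."""
    return np.maximum(0.0, 1.0 - (2.0 ** j) * t)

def sup_distance(j, k, grid):
    """Sup-norm distance between f_j and f_k, evaluated on a finite grid."""
    return float(np.max(np.abs(f(j, grid) - f(k, grid))))

def l1_distance(j, k):
    """Exact L_1 distance: f_j integrates to 2^(-j-1), and f_k <= f_j when j < k."""
    return abs(2.0 ** (-j - 1) - 2.0 ** (-k - 1))

# Dyadic grid on [0, 1]; it contains every point 2^(-k) with k <= 12.
grid = np.linspace(0.0, 1.0, 2 ** 12 + 1)

for j, k in [(1, 2), (3, 7), (9, 10)]:
    print(j, k, sup_distance(j, k, grid), l1_distance(j, k))
```

The sup distances stay bounded away from zero while the $L_1$ distances shrink geometrically in $j$, which is the phenomenon that motivates switching to the $L_p$-metric.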

This motivated us to study the covering numbers of the class
$\mathcal{C}([a,b]^d, B)$ under a different metric, namely the
$L_p$-metric for $1 \le p < \infty$. We recall that under the
$L_p$-metric, $1 \le p < \infty$, the distance between two functions $f$
and $g$ on $[a,b]^d$ is defined as

$$\|f - g\|_p := \left( \int_{x \in [a,b]^d} |f(x) - g(x)|^p \, dx \right)^{1/p}.$$


Our main result in this paper shows that if one works with the
$L_p$-metric as opposed to $L_\infty$, then the covering numbers of
$\mathcal{C}([a,b]^d, B)$ are finite. Moreover, their logarithms are
bounded from above and below by constant multiples of $\epsilon^{-d/2}$
for sufficiently small $\epsilon$.

III. $L_p$-Covering Number Bounds for $\mathcal{C}([a,b]^d, B)$

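In this section, we prove upper and lower bounds for the
$\epsilon$-covering number of $\mathcal{C}([a,b]^d, B)$ under the
$L_p$-metric, $1 \le p < \infty$. Let us start by noting a simple scaling
identity that allows us to take $a = 0$, $b = 1$ and $B = 1$, without
loss of generality. For each $f \in \mathcal{C}([a,b]^d, B)$, let us
define $\tilde{f}$ on $[0,1]^d$ by
$\tilde{f}(x) := f(a\mathbf{1} + (b-a)x)/B$, where
$\mathbf{1} = (1, \dots, 1) \in \mathbb{R}^d$. Clearly
$\tilde{f} \in \mathcal{C}([0,1]^d, 1)$ and, for $1 \le p < \infty$,

$$(b-a)^d B^p \int_{x \in [0,1]^d} \left|\tilde{f}(x) - g(x)\right|^p dx = \int_{y \in [a,b]^d} \left| f(y) - B g\left(\frac{y - a\mathbf{1}}{b-a}\right) \right|^p dy,$$

for $g \in \mathcal{C}([0,1]^d, 1)$. It follows that covering $f$ to
within $\epsilon$ in the $L_p$-metric on $[a,b]^d$ is equivalent to
covering $\tilde{f}$ to within $(b-a)^{-d/p}\epsilon/B$ in the
$L_p$-metric on $[0,1]^d$. Therefore, for $1 \le p < \infty$,

$$M(\mathcal{C}([a,b]^d, B), \epsilon; L_p) = M(\mathcal{C}([0,1]^d, 1), \epsilon'; L_p), \tag{1}$$

where $\epsilon' := (b-a)^{-d/p}\epsilon/B$.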
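
As a sanity check of the change of variables behind (1), the following Python sketch compares the two sides of the scaling relation for $d = 1$ using a midpoint quadrature rule. The functions `f` and `g`, the numerical values of `a`, `b`, `B`, `p`, and the helper names `lp_norm` and `rescaled_radius` are hypothetical choices made only for this illustration.

```python
import numpy as np

def lp_norm(h, lo, hi, p, n=200_000):
    """Midpoint-rule approximation of the L_p norm of h on [lo, hi] (d = 1)."""
    x = lo + (hi - lo) * (np.arange(n) + 0.5) / n
    return float((np.mean(np.abs(h(x)) ** p) * (hi - lo)) ** (1.0 / p))

def rescaled_radius(eps, a, b, B, d, p):
    """The radius eps' = (b - a)^(-d/p) * eps / B appearing in identity (1)."""
    return (b - a) ** (-d / p) * eps / B

# Hypothetical example data: f is convex on [a, b] and bounded by B,
# g is convex on [0, 1] and bounded by 1.
a, b, B, p = -1.0, 3.0, 5.0, 3.0
f = lambda y: B * np.abs(y) / 3.0
g = lambda x: x ** 2
f_tilde = lambda x: f(a + (b - a) * x) / B

# Distance on [a, b] between f and the rescaled copy of g ...
dist_ab = lp_norm(lambda y: f(y) - B * g((y - a) / (b - a)), a, b, p)
# ... versus distance on [0, 1] between f_tilde and g.
dist_01 = lp_norm(lambda x: f_tilde(x) - g(x), 0.0, 1.0, p)

print(dist_ab, (b - a) ** (1.0 / p) * B * dist_01)
print(rescaled_radius(dist_ab, a, b, B, d=1, p=p), dist_01)
```

Both printed pairs should agree up to floating-point error, reflecting that a ball of radius $\epsilon$ on $[a,b]^d$ corresponds to a ball of radius $\epsilon'$ on $[0,1]^d$.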

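A. Upper Bound for $M(\mathcal{C}([a,b]^d, B), \epsilon; L_p)$

Theorem 3.1: Fix $1 \le p < \infty$. There exist positive constants $c$
and $\epsilon_0$, depending only on the dimension $d$ and $p$, such that,
for every $B > 0$ and $b > a$, we have

$$\log M(\mathcal{C}([a,b]^d, B), \epsilon; L_p) \le c \left( \frac{\epsilon}{B (b-a)^{d/p}} \right)^{-d/2}$$

for every $\epsilon \le \epsilon_0 B (b-a)^{d/p}$.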

The main ingredient in our proof of the above theorem is an extension of
Bronshtein's theorem to uniformly bounded convex functions having
different Lipschitz constraints in different directions. Specifically,
for $B \in (0,\infty)$, $\Gamma_i \in (0,\infty]$ and $a_i < b_i$ for
$i = 1, \dots, d$, let
$\mathcal{C}\left(\prod_{i=1}^d [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d\right)$
denote the set of all real-valued convex functions $f$ on the rectangle
$[a_1,b_1] \times \dots \times [a_d,b_d]$ that are uniformly bounded by
$B$ and satisfy:

$$|f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_d) - f(x_1, \dots, x_{i-1}, y_i, x_{i+1}, \dots, x_d)| \le \Gamma_i |x_i - y_i| \tag{2}$$

for every $i = 1, \dots, d$; $x_i, y_i \in [a_i,b_i]$ and
$x_j \in [a_j,b_j]$ for $j \ne i$. In other words, the function
$x \mapsto f(x_1, \dots, x_{i-1}, x, x_{i+1}, \dots, x_d)$ is Lipschitz on
$[a_i,b_i]$ with constant $\Gamma_i$ for all $x_j \in [a_j,b_j]$,
$j \ne i$.

Clearly, the class $\mathcal{C}([a,b]^d, B, \Gamma)$ that Bronshtein
studied is contained in $\mathcal{C}([a,b]^d; B; \Gamma, \dots, \Gamma)$.
Also, it is easy to check that every function $f$ in
$\mathcal{C}\left(\prod_i [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d\right)$
is Lipschitz with respect to the Euclidean norm on $\prod_i [a_i,b_i]$
with Lipschitz constant $\sqrt{\Gamma_1^2 + \dots + \Gamma_d^2}$.
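
The Euclidean Lipschitz bound can be verified in one line by changing one coordinate at a time; the following short derivation is our own filling-in of the "easy to check" step, not text from the original. For $x, y$ in the rectangle, every intermediate point $(y_1, \dots, y_{i}, x_{i+1}, \dots, x_d)$ again lies in the rectangle, so the coordinatewise constraint and the Cauchy-Schwarz inequality give

$$|f(x) - f(y)| \le \sum_{i=1}^d \left| f(y_1, \dots, y_{i-1}, x_i, x_{i+1}, \dots, x_d) - f(y_1, \dots, y_{i-1}, y_i, x_{i+1}, \dots, x_d) \right| \le \sum_{i=1}^d \Gamma_i |x_i - y_i| \le \sqrt{\Gamma_1^2 + \dots + \Gamma_d^2}\; |x - y|.$$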

Note that for $\Gamma_i = \infty$, the inequality (2) is satisfied by
every function $f$. As a result, we have the equality
$\mathcal{C}([a,b]^d, B) = \mathcal{C}([a,b]^d; B; \infty, \dots, \infty)$.
The following result gives an upper bound for the $\epsilon$-covering
number of
$\mathcal{C}\left(\prod_i [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d\right)$
and is the main ingredient in the proof of Theorem 3.1. Its proof is
similar to Bronshtein's proof [3, Proof of Theorem 6] of his upper bound
on $\mathcal{C}([a,b]^d, B, \Gamma)$ and is included in Section IV.

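Theorem 3.2: There exist positive constants $c$ and $\epsilon_0$,
depending only on the dimension $d$, such that for every positive
$B, \Gamma_1, \dots, \Gamma_d$ and rectangle
$[a_1,b_1] \times \dots \times [a_d,b_d]$, we have

$$\log M\left( \mathcal{C}\left( \prod_{i=1}^d [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d \right), \epsilon; L_\infty \right) \le c \left( \frac{B + \sum_{i=1}^d \Gamma_i (b_i - a_i)}{\epsilon} \right)^{d/2} \tag{3}$$

for all $0 < \epsilon \le \epsilon_0 \left( B + \sum_{i=1}^d \Gamma_i (b_i - a_i) \right)$.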

Remark 3.1: Note that the right hand side of (3) equals $\infty$ unless
$\Gamma_i < \infty$ for all $i = 1, \dots, d$. Thus, Theorem 3.2 is only
meaningful when $\Gamma_i < \infty$ for all $i = 1, \dots, d$.

Remark 3.2: Because $\mathcal{C}([a,b]^d, B, \Gamma)$ is contained in
$\mathcal{C}([a,b]^d; B; \Gamma, \dots, \Gamma)$, Theorem 3.2 includes
Bronshtein's upper bound on $\mathcal{C}([a,b]^d, B, \Gamma)$ as a special
case. Moreover, it gives explicit dependence of the upper bound on the
constants $a$, $b$, $B$ and $\Gamma$. Bronshtein did not state the
dependence on these constants.

We are now ready to prove Theorem 3.1 using Theorem 3.2. Here is the
intuition behind the proof. The class $\mathcal{C}([a,b]^d, B)$ can be
thought of as an expansion of the class
$\mathcal{C}([a,b]^d; B; \Gamma_1, \dots, \Gamma_d)$ formed by the removal
of the $d$ Lipschitz constraints $\Gamma_1, \dots, \Gamma_d$ (or
equivalently, by setting $\Gamma_1 = \dots = \Gamma_d = \infty$). Instead
of removing all these $d$ Lipschitz constraints at the same time, we
remove them sequentially one at a time. This is formally accomplished by
induction on the number of indices $i$ for which $\Gamma_i = \infty$.
Each step of the induction argument focuses on the removal of one finite
$\Gamma_i$ and is thus like solving the one-dimensional problem. We
consequently use Dryanov's ideas from [2, Theorem 3.1] to solve this
quasi one-dimensional problem, which allows us to complete the induction
step.

Proof of Theorem 3.1: The scaling identity (1) lets us take $a = 0$,
$b = 1$ and $B = 1$. We shall prove that there exist positive constants
$c$ and $\epsilon_0$, depending only on $d$ and $p$, such that for every
$\Gamma_i \in (0, \infty]$, we have

$$\log M\left( \mathcal{C}([0,1]^d; 1; \Gamma_1, \dots, \Gamma_d), \epsilon; L_p \right) \le c \left( 2 + \sum_{i : \Gamma_i < \infty} \Gamma_i \right)^{d/2} \epsilon^{-d/2} \tag{4}$$

for $0 < \epsilon \le \epsilon_0$. Note that this proves the theorem
because we can set $\Gamma_i = \infty$ for all $i = 1, \dots, d$. Our
proof will involve induction on $l$: the number of indices $i$ for which
$\Gamma_i = \infty$.

For $l = 0$, i.e., when $\Gamma_i < \infty$ for all $i = 1, \dots, d$,
(4) is a direct consequence of Theorem 3.2. In fact, in this case, (4)
also holds for $p = \infty$. Suppose now that (4) holds for all $l < k$
for some $k \in \{1, \dots, d\}$. We shall then verify it for $l = k$.
Fix $\Gamma_i \in (0, \infty]$ such that exactly $k$ of them equal
infinity. Without loss of generality, we assume that
$\Gamma_1 = \dots = \Gamma_k = \infty$ and $\Gamma_i < \infty$ for
$i > k$. For every sufficiently small $\epsilon > 0$, we shall exhibit an
$\epsilon$-cover of
$\mathcal{C}([0,1]^d; 1; \infty, \dots, \infty, \Gamma_{k+1}, \dots, \Gamma_d)$
in the $L_p$-metric whose cardinality has logarithm bounded from above by
a constant multiple of
$\left( \sum_{i>k} \Gamma_i + 2 \right)^{d/2} \epsilon^{-d/2}$. Note that
for $k = d$, the term $\sum_{i>k} \Gamma_i$ equals zero. For convenience,
let us denote the class
$\mathcal{C}([0,1]^d; 1; \infty, \dots, \infty, \Gamma_{k+1}, \dots, \Gamma_d)$
by $\mathcal{G}$ in the rest of this proof.

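Let

$$u := \exp\left( -2(p+1)^2 (p+2) \log 2 \right) \quad \text{and} \quad v := 1 - u. \tag{5}$$

Fix $\eta > 0$ and choose an integer $A$ and
$\delta_1, \dots, \delta_{A+1}$ such that

$$\eta^p = \delta_1 < \dots < \delta_A \le u < \delta_{A+1}.$$

For every two functions $f$ and $g$ on $[0,1]^d$, we can obviously
decompose the integral $\int |f - g|^p$ as

$$\int_{[0,1]^d} |f-g|^p = \int_{[0,u] \times [0,1]^{d-1}} |f-g|^p + \int_{[u,v] \times [0,1]^{d-1}} |f-g|^p + \int_{[v,1] \times [0,1]^{d-1}} |f-g|^p.$$

Also,

$$\int_{[0,u] \times [0,1]^{d-1}} |f-g|^p \le \int_{[0,\delta_1] \times [0,1]^{d-1}} |f-g|^p + \sum_{m=1}^{A} \int_{[\delta_m, \delta_{m+1}] \times [0,1]^{d-1}} |f-g|^p.$$

For a fixed $m = 1, \dots, A$, consider the problem of covering the
functions in $\mathcal{G}$ on the rectangular strip
$[\delta_m, \delta_{m+1}] \times [0,1]^{d-1}$. Clearly,

$$\int_{[\delta_m, \delta_{m+1}] \times [0,1]^{d-1}} |f-g|^p = (\delta_{m+1} - \delta_m) \int_{[0,1]^d} |\tilde{f} - \tilde{g}|^p, \tag{6}$$

where, for $x = (x_1, \dots, x_d) \in [0,1]^d$,

$$\tilde{f}(x) := f(\delta_m + (\delta_{m+1} - \delta_m) x_1, x_2, \dots, x_d)$$

and $\tilde{g}(x) := g(\delta_m + (\delta_{m+1} - \delta_m) x_1, x_2, \dots, x_d)$.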

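By convexity, the restriction of every function $f$ in $\mathcal{G}$ to
$[\delta_m, \delta_{m+1}] \times [0,1]^{d-1}$ belongs to the class

$$\mathcal{C}\left( [\delta_m, \delta_{m+1}] \times [0,1]^{d-1}; 1; 2/\delta_m, \infty, \dots, \infty, \Gamma_{k+1}, \dots, \Gamma_d \right).$$

Consequently, the corresponding function $\tilde{f}$ belongs to

$$\mathcal{C}\left( [0,1]^d; 1; 2(\delta_{m+1} - \delta_m)/\delta_m, \infty, \dots, \infty, \Gamma_{k+1}, \dots, \Gamma_d \right).$$

Because $2(\delta_{m+1} - \delta_m)/\delta_m < \infty$, we can use the
induction hypothesis to assert the existence of positive constants
$\epsilon_0$ and $c$, depending only on $d$ and $p$, such that for every
positive real number $\alpha_m \le \epsilon_0$, there exists an
$\alpha_m$-cover of
$\mathcal{C}([0,1]^d; 1; 2(\delta_{m+1} - \delta_m)/\delta_m, \infty, \dots, \infty, \Gamma_{k+1}, \dots, \Gamma_d)$
in the $L_p$-metric on $[0,1]^d$ of size smaller than

$$\exp\left( c\, \alpha_m^{-d/2} \left( 2 + \frac{2(\delta_{m+1} - \delta_m)}{\delta_m} + \sum_{i>k} \Gamma_i \right)^{d/2} \right) \le \exp\left( c \left( 2 + \sum_{i>k} \Gamma_i \right)^{d/2} \left( \frac{\delta_{m+1}}{\alpha_m \delta_m} \right)^{d/2} \right).$$
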

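By covering the functions in $\mathcal{G}$ by the constant function $0$
on $[0, \delta_1] \times [0,1]^{d-1}$ and up to $\alpha_m$ in the
$L_p$-metric on $[\delta_m, \delta_{m+1}] \times [0,1]^{d-1}$ for
$m = 1, \dots, A$, we obtain a cover of the restriction of the functions
in $\mathcal{G}$ to the set $[0,u] \times [0,1]^{d-1}$ in the
$L_p$-metric having coverage $S_1^{1/p}$ and cardinality bounded from
above by $\exp(S_2)$ where

$$S_1 := \delta_1 + \sum_{m=1}^{A} \alpha_m^p (\delta_{m+1} - \delta_m) \quad \text{and} \quad S_2 := c \left( 2 + \sum_{i>k} \Gamma_i \right)^{d/2} \sum_{m=1}^{A} \left( \frac{\delta_{m+1}}{\alpha_m \delta_m} \right)^{d/2}.$$
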

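Suppose now that

$$\delta_m := \exp\left( p \left( \frac{p+1}{p+2} \right)^{m-1} \log \eta \right) \quad \text{and} \quad \alpha_m := \eta \exp\left( -p\, \frac{(p+1)^{m-2}}{(p+2)^{m-1}} \log \eta \right)$$

for $m = 1, \dots, A+1$, where $A$ is the largest integer such that

$$\exp\left( p \left( \frac{p+1}{p+2} \right)^{A-1} \log \eta \right) \le u.$$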



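Then,

$$S_1 = \delta_1 + \sum_{m=1}^{A} \alpha_m^p (\delta_{m+1} - \delta_m) \le \delta_1 + \sum_{m=1}^{A} \alpha_m^p \delta_{m+1} = \eta^p \left( 1 + \sum_{m=1}^{A} c_m^2 \right)$$

and

$$S_2 = c \left( 2 + \sum_{i>k} \Gamma_i \right)^{d/2} \eta^{-d/2} \sum_{m=1}^{A} c_m^d,$$

where

$$c_m := \sqrt{\frac{\eta\, \delta_{m+1}}{\alpha_m \delta_m}} = \exp\left( \frac{p}{2(p+1)(p+2)} \left( \frac{p+1}{p+2} \right)^{m-1} \log \eta \right).$$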
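Note that if $\eta \le 1$, then $\log \eta \le 0$ which implies
$c_m \le 1$. Also, for $m = 2, \dots, A$, we have

$$\frac{c_m}{c_{m-1}} = \exp\left( \frac{-p \log \eta}{2(p+1)^2(p+2)} \left( \frac{p+1}{p+2} \right)^{m-1} \right) \ge \exp\left( \frac{-p \log \eta}{2(p+1)^2(p+2)} \left( \frac{p+1}{p+2} \right)^{A-1} \right) = \exp\left( \frac{-\log \delta_A}{2(p+1)^2(p+2)} \right) \ge \exp\left( \frac{-\log u}{2(p+1)^2(p+2)} \right) = 2,$$

where we have used $\delta_A \le u$ and the fact that $u$ has the
expression (5). Therefore $c_m \ge 2 c_{m-1}$, which can be rewritten as

$$c_m^r \le \frac{2^r}{2^r - 1} \left( c_m^r - c_{m-1}^r \right) \quad \text{for every } r \ge 1.$$

Thus,

$$\sum_{m=1}^{A} c_m^r \le c_1^r + \frac{2^r}{2^r - 1} \sum_{m=2}^{A} \left( c_m^r - c_{m-1}^r \right) = \frac{1}{2^r - 1} \left( 2^r c_A^r - c_1^r \right) \le \frac{2^r}{2^r - 1}.$$

Using this for $r = 2$ and $r = d$, we deduce that

$$S_1 \le \frac{7}{3} \eta^p \quad \text{and} \quad S_2 \le c\, \frac{2^d}{2^d - 1} \left( 2 + \sum_{i>k} \Gamma_i \right)^{d/2} \eta^{-d/2}.$$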
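
The geometric partition used above can be explored numerically. The following Python sketch is an illustration under our reading of the definitions (the function names `partition_constants` and `S1` are ours): it computes $u$ from (5), the breakpoints $\delta_m$, the radii $\alpha_m$ and the constants $c_m = \sqrt{\eta \delta_{m+1} / (\alpha_m \delta_m)}$, so that the doubling property $c_m \ge 2 c_{m-1}$ and the bound $S_1 \le \frac{7}{3}\eta^p$ can be inspected for concrete values of $\eta$ and $p$.

```python
import math

def partition_constants(eta, p):
    """Return (A, delta, alpha, c) for the partition of [0, u] used in the proof.

    delta, alpha and c are dicts keyed by m, matching the 1-based indexing
    of the text: delta has keys 1..A+1, alpha and c have keys 1..A.
    """
    u = math.exp(-2 * (p + 1) ** 2 * (p + 2) * math.log(2))  # equation (5)
    if eta ** p > u:
        raise ValueError("need eta^p <= u so that A >= 1")
    r = (p + 1) / (p + 2)
    log_eta = math.log(eta)

    def delta_at(m):
        return math.exp(p * r ** (m - 1) * log_eta)

    # A is the largest index with delta_A <= u; delta_m increases to 1 as m grows.
    A = 1
    while delta_at(A + 1) <= u:
        A += 1

    delta = {m: delta_at(m) for m in range(1, A + 2)}
    alpha = {
        m: eta * math.exp(-p * (p + 1) ** (m - 2) / (p + 2) ** (m - 1) * log_eta)
        for m in range(1, A + 1)
    }
    c = {
        m: math.sqrt(eta * delta[m + 1] / (alpha[m] * delta[m]))
        for m in range(1, A + 1)
    }
    return A, delta, alpha, c

def S1(eta, p):
    """The quantity S_1 = delta_1 + sum_m alpha_m^p (delta_{m+1} - delta_m)."""
    A, delta, alpha, _ = partition_constants(eta, p)
    return delta[1] + sum(
        alpha[m] ** p * (delta[m + 1] - delta[m]) for m in range(1, A + 1)
    )

A, delta, alpha, c = partition_constants(eta=1e-30, p=2)
print(A, [c[m] for m in range(1, A + 1)], S1(1e-30, 2) / (1e-30) ** 2)
```

Because $u$ is extremely small (for $p = 1$ it equals $2^{-24}$), the example uses a very small $\eta$; this is only needed to make $A \ge 1$ and has no bearing on the asymptotic statement.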



An exactly similar analysis can be done now to cover the restrictions of
the functions in $\mathcal{G}$ to the set $[v,1] \times [0,1]^{d-1}$
having the same coverage $S_1^{1/p}$ and same cardinality bounded by
$\exp(S_2)$. For $[u,v] \times [0,1]^{d-1}$, we note, by convexity, that
the restrictions of functions in $\mathcal{G}$ to the set
$[u,v] \times [0,1]^{d-1}$ belong to
$\mathcal{C}([u,v] \times [0,1]^{d-1}; 1; 2/u, \infty, \dots, \infty, \Gamma_{k+1}, \dots, \Gamma_d)$.
By the induction hypothesis, there exist constants $c$ and $\epsilon_0$,
depending only on $d$ and $p$, such that for all $\eta \le \epsilon_0$,
one can get an $\eta$-cover of
$\mathcal{C}([u,v] \times [0,1]^{d-1}; 1; 2/u, \infty, \dots, \infty, \Gamma_{k+1}, \dots, \Gamma_d)$
in the $L_p$-metric having cardinality smaller than

$$\exp\left( c\, \eta^{-d/2} \left( 2 + \frac{2}{u} + \sum_{i>k} \Gamma_i \right)^{d/2} \right) \le \exp\left( c \left( \frac{2}{u} \right)^{d/2} \left( 2 + \sum_{i>k} \Gamma_i \right)^{d/2} \eta^{-d/2} \right).$$


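Observe that $u$ only depends on $p$. By combining the covers of the
restrictions of functions in $\mathcal{G}$ to these three strips
$[0,u] \times [0,1]^{d-1}$, $[u,v] \times [0,1]^{d-1}$ and
$[v,1] \times [0,1]^{d-1}$, we obtain, for $\eta \le \epsilon_0$, a cover
of $\mathcal{G}$ in the $L_p$-metric having coverage at most

$$\left( \frac{7}{3} \eta^p + \eta^p + \frac{7}{3} \eta^p \right)^{1/p} = \left( \frac{17}{3} \right)^{1/p} \eta$$

and cardinality at most

$$\exp\left( c \left( \frac{2^{d+1}}{2^d - 1} + \left( \frac{2}{u} \right)^{d/2} \right) \left( 2 + \sum_{i>k} \Gamma_i \right)^{d/2} \eta^{-d/2} \right).$$

By relabelling $(17/3)^{1/p} \eta$ as $\epsilon$, we have proved that for
$\epsilon \le (3/17)^{1/p} \epsilon_0$,

$$\log M(\mathcal{G}, \epsilon; L_p) \le c \left( \frac{2^{d+1}}{2^d - 1} + \left( \frac{2}{u} \right)^{d/2} \right) \left( \frac{17}{3} \right)^{d/(2p)} \left( 2 + \sum_{i>k} \Gamma_i \right)^{d/2} \epsilon^{-d/2}.$$

This proves (4) for all $\Gamma_1, \dots, \Gamma_d$ such that exactly $k$
of them equal $\infty$. The proof is complete by induction. ■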
Remark 3.3: The argument used in the induction step above involved
splitting the interval $[0,1]$ into the three intervals $[0,u]$, $[u,v]$
and $[v,1]$, and then subsequently splitting the interval $[0,u]$ into
smaller subintervals. We have borrowed this idea from Dryanov [2, Proof
of Theorem 3.1]. We must mention, however, that Dryanov uses a more
elaborate argument to bound sums of the form $S_1$ and $S_2$. Our way of
controlling $S_1$ and $S_2$ is much simpler, which shortens the argument
considerably.

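B. Lower Bound for $M(\mathcal{C}([a,b]^d, B), \epsilon; L_p)$

Theorem 3.3: There exist positive constants $c$ and $\epsilon_0$,
depending only on the dimension $d$, such that for every $p \ge 1$,
$B > 0$ and $b > a$, we have

$$\log M(\mathcal{C}([a,b]^d, B), \epsilon; L_p) \ge c \left( \frac{\epsilon}{B (b-a)^{d/p}} \right)^{-d/2}$$

for $\epsilon \le \epsilon_0 B (b-a)^{d/p}$.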

Proof: As before, by the scaling identity (1) we take $a = 0$, $b = 1$
and $B = 1$. For functions defined on $[0,1]^d$, the $L_p$-metric,
$p > 1$, is larger than $L_1$. We will thus take $p = 1$ in the rest of
this proof. We prove that for $\epsilon$ sufficiently small, there exists
an $\epsilon$-packing subset of $\mathcal{C}([0,1]^d, 1)$, under the
$L_1$-metric, of log-cardinality larger than a constant multiple of
$\epsilon^{-d/2}$. By an $\epsilon$-packing subset of
$\mathcal{C}([0,1]^d, 1)$, we mean a subset $F$ satisfying
$\|f - g\|_1 \ge \epsilon$ whenever $f, g \in F$ with $f \ne g$.

Fix $0 < \eta \le 4(2 + \sqrt{d-1})^{-2}$ and let $k := k(\eta)$ be the
positive integer satisfying

$$k \le \frac{2 \eta^{-1/2}}{2 + \sqrt{d-1}} < k + 1 \le 2k. \tag{8}$$

Consider the intervals $I(i) = [u(i), v(i)]$ for $i = 1, \dots, k$ such
that

1) $0 \le u(1) < v(1) \le u(2) < v(2) \le \dots \le u(k) < v(k) \le 1$,

2) $v(i) - u(i) = \sqrt{\eta}$, for $i = 1, \dots, k$,

3) $u(i+1) - v(i) = \frac{1}{2} \sqrt{\eta (d-1)}$ for $i = 1, \dots, k-1$.

Let $\mathcal{S}$ denote the set of all $d$-dimensional cubes of the form
$I(i_1) \times \dots \times I(i_d)$ where
$i_1, \dots, i_d \in \{1, \dots, k\}$. The cardinality of $\mathcal{S}$,
denoted by $|\mathcal{S}|$, is clearly $k^d$.
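
To make the construction concrete, the following Python sketch builds one admissible family of intervals. It is an illustration only: the name `packing_intervals` is ours, and placing the first interval at $u(1) = 0$ is an assumption (the text only requires $u(1) \ge 0$). The integer $k$ is taken as the integer part of $2\eta^{-1/2}/(2 + \sqrt{d-1})$, as in (8).

```python
import math

def packing_intervals(eta, d):
    """Intervals [u(i), v(i)], i = 1..k, of length sqrt(eta), separated by
    gaps of length 0.5 * sqrt(eta * (d - 1)), with k chosen as in (8)."""
    limit = 4.0 / (2.0 + math.sqrt(d - 1)) ** 2
    if not 0.0 < eta <= limit:
        raise ValueError("eta must lie in (0, 4 (2 + sqrt(d - 1))^-2]")
    # max(1, ...) guards against rounding when eta sits exactly at the limit.
    k = max(1, math.floor(2.0 / (math.sqrt(eta) * (2.0 + math.sqrt(d - 1)))))
    width = math.sqrt(eta)
    gap = 0.5 * math.sqrt(eta * (d - 1))
    # Assumption for this sketch: the first interval starts at 0.
    intervals = [(i * (width + gap), i * (width + gap) + width) for i in range(k)]
    return k, intervals

k, intervals = packing_intervals(eta=0.01, d=3)
print(k, k ** 3, intervals[-1])
```

The total length $k\sqrt{\eta} + (k-1)\frac{1}{2}\sqrt{\eta(d-1)}$ is at most $k \sqrt{\eta}\,(2 + \sqrt{d-1})/2 \le 1$ by the first inequality in (8), which is why all intervals fit inside $[0,1]$.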

For each $S \in \mathcal{S}$ with $S = I(i_1) \times \dots \times I(i_d)$
where $I(i_j) = [u(i_j), v(i_j)]$, let us define the function
$h_S : [0,1]^d \to \mathbb{R}$ as

$$h_S(x) = h_S(x_1, \dots, x_d) := f_0(x) + \frac{1}{d} \sum_{j=1}^{d} \{x_j - u(i_j)\}\{v(i_j) - x_j\}, \tag{9}$$

where $f_0(x) := \frac{1}{d}(x_1^2 + \dots + x_d^2)$, for
$x \in [0,1]^d$. The functions $h_S$, $S \in \mathcal{S}$, have the
following four key properties:

1) $h_S$ is affine and hence convex.

2) For every $x \in [0,1]^d$, we have
$h_S(x) \le h_S(1, \dots, 1) \le 1$.

3) For every $x \in S$, we have $h_S(x) \ge f_0(x)$. This is because
whenever $x \in S$, we have $u(i_j) \le x_j \le v(i_j)$ for each $j$,
which implies $\{x_j - u(i_j)\}\{v(i_j) - x_j\} \ge 0$.

4) Let $S, S' \in \mathcal{S}$ with $S \ne S'$. For every $x \in S'$, we
have $h_S(x) \le f_0(x)$. To see this, let
$S' = I(i'_1) \times \dots \times I(i'_d)$ with
$I(i'_j) = [u(i'_j), v(i'_j)]$. Let $x \in S'$ and fix $1 \le j \le d$.
If $I(i_j) = I(i'_j)$, then $x_j \in I(i_j) = [u(i_j), v(i_j)]$ and hence

$$\{x_j - u(i_j)\}\{v(i_j) - x_j\} \le \frac{\{v(i_j) - u(i_j)\}^2}{4} = \frac{\eta}{4}.$$

If $I(i_j) \ne I(i'_j)$ and $u(i'_j) < v(i'_j) \le u(i_j) < v(i_j)$,
then

$$\{x_j - u(i_j)\}\{v(i_j) - x_j\} \le -\{u(i_j) - v(i'_j)\}^2 \le -\frac{(d-1)\eta}{4}.$$

The same bound holds if $u(i_j) < v(i_j) \le u(i'_j) < v(i'_j)$. Because
$S \ne S'$, we have $i_j \ne i'_j$ for at least one $j$. Consequently,

$$h_S(x) \le f_0(x) + \frac{1}{d} \left( \sum_{j : i_j = i'_j} \frac{\eta}{4} - \sum_{j : i_j \ne i'_j} \frac{(d-1)\eta}{4} \right) \le f_0(x).$$




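Let $\{0,1\}^{\mathcal{S}}$ denote the collection of all
$\{0,1\}$-valued functions on $\mathcal{S}$. The cardinality of
$\{0,1\}^{\mathcal{S}}$ clearly equals $2^{|\mathcal{S}|}$ (recall that
$|\mathcal{S}| = k^d$).

For each $\theta \in \{0,1\}^{\mathcal{S}}$, let

$$g_\theta(x) := \max\left( \max_{S \in \mathcal{S} : \theta(S) = 1} h_S(x),\; f_0(x) \right).$$

The first two properties of $h_S$, $S \in \mathcal{S}$, ensure that
$g_\theta \in \mathcal{C}([0,1]^d, 1)$. The last two properties imply
that

$$g_\theta(x) = h_S(x)\theta(S) + f_0(x)(1 - \theta(S)) \quad \text{for } x \in S.$$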

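We now bound from below the $L_1$ distance between $g_\theta$ and
$g_{\theta'}$ for $\theta, \theta' \in \{0,1\}^{\mathcal{S}}$. Because
the interiors of the cubes in $\mathcal{S}$ are all disjoint, we can
write

$$\|g_\theta - g_{\theta'}\|_1 \ge \sum_{S \in \mathcal{S}} \int_{x \in S} |g_\theta(x) - g_{\theta'}(x)| \, dx = \sum_{S \in \mathcal{S}} \mathbb{1}\{\theta(S) \ne \theta'(S)\} \int_{x \in S} |h_S(x) - f_0(x)| \, dx.$$

Note that from (9) and by symmetry, the value of the integral

$$C := \int_{x \in S} |h_S(x) - f_0(x)| \, dx$$

is the same for all $S \in \mathcal{S}$. We have thus shown that

$$\|g_\theta - g_{\theta'}\|_1 \ge C\, \Upsilon(\theta, \theta') \quad \text{for all } \theta, \theta' \in \{0,1\}^{\mathcal{S}}, \tag{10}$$

where
$\Upsilon(\theta, \theta') := \sum_{S \in \mathcal{S}} \mathbb{1}\{\theta(S) \ne \theta'(S)\}$
denotes the Hamming distance.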

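The quantity $C$ can be computed in the following way. Let
$S = I(i_1) \times \dots \times I(i_d)$ where
$I(i_j) = [u(i_j), v(i_j)]$. We write

$$C = \frac{1}{d} \int_{u(i_1)}^{v(i_1)} \dots \int_{u(i_d)}^{v(i_d)} \sum_{j=1}^{d} \{x_j - u(i_j)\}\{v(i_j) - x_j\} \, dx_d \dots dx_1.$$

By the change of variable
$y_j = \{x_j - u(i_j)\}/\{v(i_j) - u(i_j)\}$ for $j = 1, \dots, d$, we
get

$$C = \frac{1}{d} \prod_{j=1}^{d} \{v(i_j) - u(i_j)\} \int_{[0,1]^d} \sum_{j=1}^{d} \{v(i_j) - u(i_j)\}^2\, y_j (1 - y_j) \, dy.$$

Recalling that $v(i) - u(i) = \sqrt{\eta}$ for all $i = 1, \dots, k$, we
get $C = \eta^{d/2} \eta\, \gamma_d$ where

$$\gamma_d := \frac{1}{d} \int_{[0,1]^d} \sum_{j=1}^{d} y_j (1 - y_j) \, dy.$$

Note that $\gamma_d$ is a constant that depends on the dimension $d$
alone. Thus, from (10), we deduce

$$\|g_\theta - g_{\theta'}\|_1 \ge \gamma_d\, \eta^{d/2} \eta\, \Upsilon(\theta, \theta') \tag{11}$$
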

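for all $\theta, \theta' \in \{0,1\}^{\mathcal{S}}$. We now use the
Varshamov-Gilbert lemma (see e.g., [18, Lemma 4.7]) which asserts the
existence of a subset $W$ of $\{0,1\}^{\mathcal{S}}$ with cardinality
$|W| \ge \exp(|\mathcal{S}|/8)$ such that
$\Upsilon(\tau, \tau') \ge |\mathcal{S}|/4$ for all $\tau, \tau' \in W$
with $\tau \ne \tau'$. Thus, from (11) and (8), we get that for every
$\tau, \tau' \in W$ with $\tau \ne \tau'$,

$$\|g_\tau - g_{\tau'}\|_1 \ge \frac{\gamma_d}{4} \eta^{d/2} \eta\, |\mathcal{S}| = \frac{\gamma_d}{4} \eta^{d/2} \eta\, k^d \ge c_1 \eta,$$

where $c_1 := \frac{\gamma_d}{4} (2 + \sqrt{d-1})^{-d}$. Taking
$\epsilon := c_1 \eta$, we have obtained for
$\epsilon \le \epsilon_0 := 4 c_1 (2 + \sqrt{d-1})^{-2}$, an
$\epsilon$-packing subset of $\mathcal{C}([0,1]^d, 1)$ of size
$M := |W|$ where

$$\log M \ge \frac{|\mathcal{S}|}{8} = \frac{k^d}{8} \ge \frac{\eta^{-d/2}}{8 (2 + \sqrt{d-1})^d} = \frac{c_1^{d/2}}{8 (2 + \sqrt{d-1})^d}\, \epsilon^{-d/2} = c\, \epsilon^{-d/2},$$

where $c$ depends only on the dimension $d$. This completes the proof. ■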
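
The role of the Varshamov-Gilbert lemma in the proof above can be illustrated on a small hypercube. The Python sketch below is not the argument behind the lemma; it is a plain greedy search (the names `hamming` and `greedy_separated_subset` are ours) that, for a small $n$ standing in for $|\mathcal{S}|$, exhibits a subset of $\{0,1\}^n$ whose elements are pairwise at Hamming distance at least $n/4$ and whose size can be compared with $\exp(n/8)$.

```python
import itertools
import math

def hamming(s, t):
    """Hamming distance between two 0/1 tuples of equal length."""
    return sum(a != b for a, b in zip(s, t))

def greedy_separated_subset(n, min_dist):
    """Greedily collect words of {0,1}^n that are pairwise at Hamming
    distance >= min_dist (scanning words in lexicographic order)."""
    chosen = []
    for word in itertools.product((0, 1), repeat=n):
        if all(hamming(word, w) >= min_dist for w in chosen):
            chosen.append(word)
    return chosen

n = 8  # small stand-in for |S| = k^d
W = greedy_separated_subset(n, min_dist=n // 4)
print(len(W), math.exp(n / 8))
```

The search is exponential in $n$ and is meant only for very small cases; the lemma itself guarantees such a set for every $n \ge 8$ without any enumeration.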
Remark 3.4: The explicit packing subset constructed in the above proof
consists of functions that can be viewed as perturbations of the
quadratic function $f_0$. Previous lower bounds on the covering numbers
of convex functions in [3, Proof of Theorem 6] and [2, Section 2] (for
$d = 1$) are based on perturbations of a function whose graph is a subset
of a sphere, a more complicated convex function than $f_0$. The
perturbations of $f_0$ in the above proof can also be used to simplify
the lower bound arguments in those papers.

IV. Distances Between Convex Functions and Their Epigraphs

One of the aims of this section is to provide the proof of Theorem 3.2.
Our strategy for the proof of Theorem 3.2 is similar to Bronshtein's
proof of the upper bound on
$M(\mathcal{C}([a,b]^d, B, \Gamma), \epsilon; L_\infty)$. The proof
involves the following ingredients:

1) An inequality between the $L_\infty$ distance between two convex
functions and the Hausdorff distance between their epigraphs.

2) The result of Bronshtein [3] for the covering numbers of convex sets
in the Hausdorff metric.

For a convex function $f$ on $[0,1]^d$ and $B > 0$, let us define the
epigraph $V_f(B)$ of $f$ by

$$V_f(B) := \left\{ (x_1, \dots, x_d, x_{d+1}) : (x_1, \dots, x_d) \in [0,1]^d \text{ and } f(x_1, \dots, x_d) \le x_{d+1} \le B \right\}.$$

If $f \in \mathcal{C}([0,1]^d, B)$, then clearly

$$x_1^2 + \dots + x_d^2 + x_{d+1}^2 \le 1 + \dots + 1 + B^2 = d + B^2$$

for every $(x_1, \dots, x_{d+1}) \in V_f(B)$. Therefore, for every
$f \in \mathcal{C}([0,1]^d, B)$, its epigraph $V_f(B)$ is contained in
the $(d+1)$-dimensional ball of radius $\sqrt{d + B^2}$ centered at the
origin. The following inequality relates the $L_\infty$ distance between
two functions in $\mathcal{C}([0,1]^d; B; \Gamma_1, \dots, \Gamma_d)$ to
the Hausdorff distance between their epigraphs. The Hausdorff distance
between two compact, convex sets $C$ and $D$ in Euclidean space is
defined by

$$\ell_H(C, D) := \max\left( \sup_{x \in C} \inf_{y \in D} |x - y|,\; \sup_{x \in D} \inf_{y \in C} |x - y| \right),$$

where $|\cdot|$ denotes Euclidean distance.
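
For readers who want to experiment with this definition, the following Python sketch computes the Hausdorff distance between two finite point sets, which can serve as a discrete stand-in for sampled epigraphs. The function name `hausdorff_distance` and the sample points are ours, and a finite sample only approximates the distance between the underlying convex sets.

```python
import numpy as np

def hausdorff_distance(C, D):
    """Hausdorff distance between two finite point sets in Euclidean space.

    C and D are array-likes of shape (n_points, dim).
    """
    C = np.asarray(C, dtype=float)
    D = np.asarray(D, dtype=float)
    # pairwise[i, j] = Euclidean distance between C[i] and D[j]
    pairwise = np.linalg.norm(C[:, None, :] - D[None, :, :], axis=2)
    sup_over_C = pairwise.min(axis=1).max()  # sup_{x in C} inf_{y in D} |x - y|
    sup_over_D = pairwise.min(axis=0).max()  # sup_{x in D} inf_{y in C} |x - y|
    return float(max(sup_over_C, sup_over_D))

C = [(0.0, 0.0), (1.0, 0.0)]
D = [(0.0, 0.0), (3.0, 0.0)]
print(hausdorff_distance(C, D))
```

In the example the two one-sided quantities differ (1 and 2 respectively), which shows why the definition takes the maximum of both directions.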

Lemma 4.1: For every pair of functions $f$ and $g$ in
$\mathcal{C}([0,1]^d; B; \Gamma_1, \dots, \Gamma_d)$, we have

$$\|f - g\|_\infty \le \ell_H(V_f(B), V_g(B)) \sqrt{1 + \Gamma_1^2 + \dots + \Gamma_d^2}.$$

Proof: We can clearly assume that $\Gamma_i < \infty$ for all
$i = 1, \dots, d$. Fix
$f, g \in \mathcal{C}([0,1]^d; B; \Gamma_1, \dots, \Gamma_d)$ and let
$\ell_H(V_f(B), V_g(B)) = \rho$. Fix $x \in [0,1]^d$ with
$f(x) \ne g(x)$. Suppose, without loss of generality, that
$f(x) < g(x)$. Now $(x, f(x)) \in V_f(B)$ and because
$\ell_H(V_f(B), V_g(B)) = \rho$, there exists $(x', y') \in V_g(B)$ with
$|(x, f(x)) - (x', y')| \le \rho$. Because $f(x) < g(x)$, the point
$(x, f(x))$ lies outside $V_g(B)$ and using the convexity of $V_g(B)$ we
can take $y' = g(x')$. Therefore,

$$0 \le g(x) - f(x) = g(x) - g(x') + g(x') - f(x) \le |x - x'| \sqrt{\Gamma_1^2 + \dots + \Gamma_d^2} + |g(x') - f(x)| \le \sqrt{\Gamma_1^2 + \dots + \Gamma_d^2 + 1}\; |(x, f(x)) - (x', y')| \le \rho \sqrt{\Gamma_1^2 + \dots + \Gamma_d^2 + 1},$$

where the second last inequality follows from the Cauchy-Schwarz (C-S)
inequality. Lemma 4.1 now follows because $x \in [0,1]^d$ is arbitrary in
the above argument. ■
The proof of Theorem 3.2, given below, is based on Lemma 4.1 and the
following result on covering numbers of convex sets proved in [3]. For
$\Gamma > 0$, let $\mathcal{K}^{d+1}(\Gamma)$ denote the set of all
compact, convex subsets of the ball in $\mathbb{R}^{d+1}$ of radius
$\Gamma$ centered at the origin. In Theorem 3 (and Remark 1) of [3],
Bronshtein proved that there exist positive constants $c$ and
$\epsilon_0$, depending only on $d$, such that

$$\log M(\mathcal{K}^{d+1}(\Gamma), \epsilon; \ell_H) \le c \left( \frac{\Gamma}{\epsilon} \right)^{d/2} \quad \text{for } \epsilon \le \Gamma \epsilon_0. \tag{12}$$

A more detailed account of Bronshtein's proof of (12) can be found in
Section 8.4 of [4].

Proof of Theorem 3.2: The conclusion of the theorem is clearly only
meaningful in the case when $\Gamma_i < \infty$ for all
$i = 1, \dots, d$. We therefore assume this in the rest of this proof.

For every
$f \in \mathcal{C}\left(\prod_{i=1}^d [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d\right)$,
let us define the function $\tilde{f}$ on $[0,1]^d$ by

$$\tilde{f}(t_1, \dots, t_d) := f\left( a_1 + (b_1 - a_1) t_1, \dots, a_d + (b_d - a_d) t_d \right),$$

for $t_1, t_2, \dots, t_d \in [0,1]$. Clearly the function $\tilde{f}$
belongs to the class
$\mathcal{C}([0,1]^d; B; \Gamma_1 (b_1 - a_1), \dots, \Gamma_d (b_d - a_d))$
and covering $f$ to within $\epsilon$ in the $L_\infty$-metric is
equivalent to covering $\tilde{f}$. Thus

$$M\left( \mathcal{C}\left( \prod_{i=1}^d [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d \right), \epsilon; L_\infty \right) = M\left( \mathcal{C}([0,1]^d; B; \Gamma_1 (b_1 - a_1), \dots, \Gamma_d (b_d - a_d)), \epsilon; L_\infty \right). \tag{13}$$

We thus take, without loss of generality, $a_i = 0$ and $b_i = 1$ for all
$i = 1, \dots, d$.

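From Lemma 4.1 and the observation that
$V_f(B) \in \mathcal{K}^{d+1}(\sqrt{d + B^2})$ for all
$f \in \mathcal{C}([0,1]^d, B)$, it follows that

$$M\left( \mathcal{C}([0,1]^d; B; \Gamma_1, \dots, \Gamma_d), \epsilon; L_\infty \right) \le M\left( \mathcal{K}^{d+1}(\sqrt{d + B^2}), \frac{\epsilon}{2\sqrt{1 + \Gamma_1^2 + \dots + \Gamma_d^2}}; \ell_H \right).$$

Thus from (12), we deduce the existence of two positive constants $c$
and $\epsilon_0$, depending only on $d$, such that

$$\log M\left( \mathcal{C}([0,1]^d; B; \Gamma_1, \dots, \Gamma_d), \epsilon; L_\infty \right) \le c \left( \frac{\sqrt{(d + B^2)(1 + \Gamma_1^2 + \dots + \Gamma_d^2)}}{\epsilon} \right)^{d/2}$$

if $\epsilon \le \epsilon_0 \sqrt{(d + B^2)(1 + \Gamma_1^2 + \dots + \Gamma_d^2)}$.
By the scaling identity (13), we obtain

$$\log M\left( \mathcal{C}\left( \prod_{i=1}^d [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d \right), \epsilon; L_\infty \right) \le c \left( \frac{\sqrt{(d + B^2)\left(1 + \sum_i \Gamma_i^2 (b_i - a_i)^2\right)}}{\epsilon} \right)^{d/2}$$

if $\epsilon \le \epsilon_0 \sqrt{(d + B^2)\left(1 + \sum_i \Gamma_i^2 (b_i - a_i)^2\right)}$.
By another scaling argument, it follows that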

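$$M\left( \mathcal{C}\left( \prod_{i=1}^d [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d \right), \epsilon; L_\infty \right) = M\left( \mathcal{C}\left( \prod_{i=1}^d [a_i,b_i]; \frac{B}{T}; \frac{\Gamma_1}{T}, \dots, \frac{\Gamma_d}{T} \right), \frac{\epsilon}{T}; L_\infty \right)$$

for every $T > 0$ and, as a consequence, we get, for every $T > 0$,

$$\log M\left( \mathcal{C}\left( \prod_{i=1}^d [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d \right), \epsilon; L_\infty \right) \le c \left( \frac{\sqrt{(d T^2 + B^2)\left(1 + \sum_i \Gamma_i^2 (b_i - a_i)^2 / T^2\right)}}{\epsilon} \right)^{d/2}$$

if $\epsilon \le \epsilon_0 \sqrt{(d T^2 + B^2)\left(1 + \sum_i \Gamma_i^2 (b_i - a_i)^2 / T^2\right)}$.
Choosing (by differentiation)

$$T^2 = B \sqrt{\frac{\sum_i \Gamma_i^2 (b_i - a_i)^2}{d}},$$

we deduce finally

$$\log M\left( \mathcal{C}\left( \prod_{i=1}^d [a_i,b_i]; B; \Gamma_1, \dots, \Gamma_d \right), \epsilon; L_\infty \right) \le c \left( \frac{B + \sqrt{d \sum_i \Gamma_i^2 (b_i - a_i)^2}}{\epsilon} \right)^{d/2}$$

if $\epsilon \le \epsilon_0 \left( B + \sqrt{d \sum_i \Gamma_i^2 (b_i - a_i)^2} \right)$.
The proof of the theorem will now be complete by noting that

$$B + \sum_{i} \Gamma_i (b_i - a_i) \le B + \sqrt{d \sum_i \Gamma_i^2 (b_i - a_i)^2} \le \sqrt{d} \left( B + \sum_i \Gamma_i (b_i - a_i) \right).$$

The terms involving $d$ can be absorbed in the constants $c$ and
$\epsilon_0$. ■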

One might wonder if a version of Lemma 4.1 can be proved for the
$L_p$-metric instead of the $L_\infty$-metric, and without any Lipschitz
constraints. Such an inequality would, in particular, yield an
alternative simpler proof of Theorem 3.1. It turns out that one can prove
such a bound for the $L_1$-metric but not for $L_p$ for any $p > 1$. The
inequality for $L_1$ is presented next. This inequality could possibly be
of independent interest. The reason why such an inequality cannot be
proved for $L_p$, $p > 1$, is explained in Remark 4.1.

Lemma 4.2: For every pair of functions $f$ and $g$ in
$\mathcal{C}([0,1]^d, 1)$, we have

$$\|f - g\|_1 \le (1 + 20d)\, \ell_H(V_f(1), V_g(1)). \tag{14}$$




Proof: For $f \in \mathcal{C}([0,1]^d, 1)$ and $x \in (0,1)^d$, let
$m_f(x)$ denote any subgradient of the convex function $f$ at $x$. Let
$\ell_H(V_f(1), V_g(1)) = \rho > 0$. Our first step is to observe that

$$|f(x) - g(x)| \le \rho \left( 1 + |m_f(x)| + |m_g(x)| \right) \tag{15}$$

for every $x \in (0,1)^d$, where $|m_f(x)|$ denotes the Euclidean norm of
the subgradient vector $m_f(x) \in \mathbb{R}^d$. To see this, fix
$x \in (0,1)^d$ with $f(x) \ne g(x)$. We assume, without loss of
generality, that $f(x) < g(x)$. Clearly $(x, f(x)) \in V_f(1)$ and
because $\ell_H(V_f(1), V_g(1)) = \rho$, there exists
$(x', y') \in V_g(1)$ with $|(x, f(x)) - (x', y')| \le \rho$. Since
$f(x) < g(x)$, the point $(x, f(x))$ lies outside the convex set $V_g(1)$
and we can thus take $y' = g(x')$. By the definition of the subgradient,
we have

$$g(x') \ge g(x) + \langle m_g(x), x' - x \rangle.$$

Therefore,

$$0 \le g(x) - f(x) = g(x) - g(x') + g(x') - f(x) \le \langle m_g(x), x - x' \rangle + |g(x') - f(x)| \le |m_g(x)|\,|x - x'| + |g(x') - f(x)| \le \sqrt{|m_g(x)|^2 + 1}\; |(x, f(x)) - (x', y')| \le \rho \sqrt{|m_g(x)|^2 + 1} \le \rho \left( 1 + |m_g(x)| \right).$$

Note that the Cauchy-Schwarz inequality has been used twice in the above
chain of inequalities. We have thus shown that
$g(x) - f(x) \le \rho(1 + |m_g(x)|)$ in the case when $f(x) < g(x)$. One
would have a similar inequality in the case when $f(x) > g(x)$. Combining
these two, we obtain (15).
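As a consequence of (15), we get

$$\|f - g\|_1 \le \int_{[0,1]^d \setminus [\rho, 1-\rho]^d} |f - g| + \int_{[\rho, 1-\rho]^d} |f - g| \le 2\left( 1 - (1 - 2\rho)^d \right) + \rho \left( 1 + \int_{[\rho, 1-\rho]^d} |m_f(x)| \, dx + \int_{[\rho, 1-\rho]^d} |m_g(x)| \, dx \right) \le \rho \left( 1 + 4d + \int_{[\rho, 1-\rho]^d} \{ |m_f(x)| + |m_g(x)| \} \, dx \right),$$

where we have used the inequality $(1 - 2\rho)^d \ge 1 - 2d\rho$. To
complete the proof of (14), we show that
$\int_{[\rho, 1-\rho]^d} |m_f(x)| \, dx \le 8d$ for every
$f \in \mathcal{C}([0,1]^d, 1)$.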
We write $m_f(x) = (m_f(x)(1), \dots, m_f(x)(d)) \in \mathbb{R}^d$ and
use the definition of the subgradient to note that for every
$x \in [\rho, 1-\rho]^d$ and $1 \le i \le d$,

$$f(x + t e_i) - f(x) \ge t\, m_f(x)(i) \tag{16}$$

for all real $t$ with $|t|$ sufficiently small, where $e_i$ is the unit
vector in the $i$th coordinate direction, i.e., $e_i(j) := 1$ if $i = j$
and $0$ otherwise. Taking $t > 0$, dividing both sides by $t$ and letting
$t \downarrow 0$, we get $m_f(x)(i) \le f'(x; e_i)$ (we use $f'(x; v)$ to
denote the directional derivative of $f$ in the direction $v$;
directional derivatives exist as $f$ is convex). Using (16) for $t < 0$,
we get $m_f(x)(i) \ge -f'(x; -e_i)$. Combining these two inequalities, we
get

$$|m_f(x)(i)| \le |f'(x; e_i)| + |f'(x; -e_i)| \quad \text{for } i = 1, \dots, d.$$

As a result,

$$\int_{[\rho, 1-\rho]^d} |m_f(x)| \, dx \le \sum_{i=1}^{d} \int_{[\rho, 1-\rho]^d} |m_f(x)(i)| \, dx \le \sum_{i=1}^{d} \left( \int_{[\rho, 1-\rho]^d} |f'(x; e_i)| \, dx + \int_{[\rho, 1-\rho]^d} |f'(x; -e_i)| \, dx \right).$$


We now show that for each $i$, both the integrals
$\int_{[\rho, 1-\rho]^d} |f'(x; e_i)| \, dx$ and
$\int_{[\rho, 1-\rho]^d} |f'(x; -e_i)| \, dx$ are bounded from above by
$4$. Assume, without loss of generality, that $i = 1$ and notice

$$\int_{[\rho, 1-\rho]^d} |f'(x; e_1)| \, dx = \int_{u \in [\rho, 1-\rho]^{d-1}} \left( \int_{\rho}^{1-\rho} |f'((x_1, u); e_1)| \, dx_1 \right) du. \tag{17}$$

We fix $u = (x_2, \dots, x_d) \in [\rho, 1-\rho]^{d-1}$ and focus on the
inner integral. Let $v(z) := f(z, x_2, \dots, x_d)$ for $z \in [0,1]$.
Clearly $v$ is a convex function on $[0,1]$ and its right derivative,
$v'_r(x_1)$, at the point $z = x_1 \in (0,1)$ equals $f'(x; e_1)$ where
$x = (x_1, \dots, x_d)$. The inner integral thus equals
$\int_{\rho}^{1-\rho} |v'_r(z)| \, dz$. Because of the convexity of $v$,
its right derivative $v'_r(z)$ is non-decreasing and satisfies

$$v(y_2) - v(y_1) = \int_{y_1}^{y_2} v'_r(z) \, dz \quad \text{for } 0 < y_1 < y_2 < 1.$$

Consequently,

$$\int_{\rho}^{1-\rho} |v'_r(z)| \, dz \le \sup_{\rho \le c \le 1-\rho} \left( -\int_{\rho}^{c} v'_r(z) \, dz + \int_{c}^{1-\rho} v'_r(z) \, dz \right) = \sup_{\rho \le c \le 1-\rho} \left( v(\rho) + v(1-\rho) - 2v(c) \right).$$

The function $v(\cdot)$ clearly satisfies $|v(z)| \le 1$ because
$f \in \mathcal{C}([0,1]^d, 1)$. This implies that
$\int_{\rho}^{1-\rho} |v'_r(z)| \, dz \le 4$. The identity (17) therefore
gives

$$\int_{[\rho, 1-\rho]^d} |f'(x; e_1)| \, dx = \int_{(x_2, \dots, x_d) \in [\rho, 1-\rho]^{d-1}} \left( \int_{\rho}^{1-\rho} |v'_r(z)| \, dz \right) dx_2 \dots dx_d \le 4.$$

Similarly, by working with left derivatives of $v$ as opposed to right,
we can prove that

$$\int_{[\rho, 1-\rho]^d} |f'(x; -e_1)| \, dx \le 4.$$

Therefore, the integral $\int_{[\rho, 1-\rho]^d} |m_f(x)| \, dx$ is at
most $8d$ because it is less than or equal to

$$\sum_{i=1}^{d} \left( \int_{[\rho, 1-\rho]^d} |f'(x; e_i)| \, dx + \int_{[\rho, 1-\rho]^d} |f'(x; -e_i)| \, dx \right).$$

This completes the proof of Lemma 4.2. ■
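Remark 4.1: Lemma 4.2 is not true if $L_1$ is replaced by $L_p$, for
$p > 1$. Indeed, if $d = 1$ and $f_\alpha(x) := \max(0, 1 - (x/\alpha))$
for $0 < \alpha \le 1$ and $g(x) := 0$ for all $x \in [0,1]$, then it can
be easily checked that for $1 \le p < \infty$,

$$\|f_\alpha - g\|_p = \left( \frac{\alpha}{p+1} \right)^{1/p} \quad \text{and} \quad \ell_H(V_{f_\alpha}(1), V_g(1)) = \frac{\alpha}{\sqrt{1 + \alpha^2}} \le \alpha.$$

As $\alpha$ can be arbitrarily close to zero, this clearly rules out any
inequality of the form (14) with the $L_1$-metric replaced by $L_p$, for
$1 < p < \infty$.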
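
The counterexample can be checked numerically. In the Python sketch below (function names are ours), the $L_p$ norm of $f_\alpha(x) = \max(0, 1 - x/\alpha)$ is computed both by a midpoint rule and by the closed form $(\alpha/(p+1))^{1/p}$, which follows from $\int_0^\alpha (1 - x/\alpha)^p dx = \alpha/(p+1)$. Since the epigraphs of $f_\alpha$ and of $g \equiv 0$ differ only inside the strip $[0,\alpha] \times [0,1]$, their Hausdorff distance is at most $\alpha$, so the printed ratio is a lower bound on $\|f_\alpha - g\|_p / \ell_H$.

```python
import numpy as np

def lp_norm_hat(alpha, p, n=200_000):
    """Midpoint-rule value of ||f_alpha - 0||_p on [0, 1],
    where f_alpha(x) = max(0, 1 - x / alpha)."""
    x = (np.arange(n) + 0.5) / n
    return float(np.mean(np.maximum(0.0, 1.0 - x / alpha) ** p) ** (1.0 / p))

def lp_norm_closed_form(alpha, p):
    """Closed form (alpha / (p + 1))^(1/p) of the same norm."""
    return (alpha / (p + 1.0)) ** (1.0 / p)

p = 2.0
for alpha in (0.1, 0.01, 0.001):
    ratio = lp_norm_closed_form(alpha, p) / alpha  # lower bound on norm / Hausdorff
    print(alpha, lp_norm_hat(alpha, p), lp_norm_closed_form(alpha, p), ratio)
```

For $p > 1$ the ratio behaves like $\alpha^{1/p - 1}$ and grows without bound as $\alpha \to 0$, whereas for $p = 1$ it stays equal to $1/2$, consistent with Lemma 4.2.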

Remark 4.2: Lemma 4.2 and Bronshtein's result (12) can be used to give an
alternative proof of Theorem 3.1 for the special case $p = 1$. Indeed,
the scaling identity (1) lets us take $a = 0$, $b = 1$ and $B = 1$.
Inequality (14) implies that the covering number
$M(\mathcal{C}([0,1]^d, 1), \epsilon; L_1)$ is less than or equal to

$$M\left( \mathcal{K}^{d+1}(\sqrt{d+1}), \frac{\epsilon}{2(1 + 20d)}; \ell_H \right).$$

Thus from (12), we deduce the existence of two positive constants $c$
and $\epsilon_0$, depending only on $d$, such that

$$\log M(\mathcal{C}([0,1]^d, 1), \epsilon; L_1) \le c\, \epsilon^{-d/2}$$

whenever $\epsilon \le \epsilon_0$. Note that, by Remark 4.1, this method
of proof does not work in the case of $L_p$, for $1 < p < \infty$.